gd algorithm
Quantum Maximum Entropy Inference and Hamiltonian Learning
Gao, Minbo, Ji, Zhengfeng, Wei, Fuchao
Maximum entropy inference is a widely used method in machine learning, particularly in the context of graphical models (McCallum et al., 2000; Kindermann & Snell, 1980; Ackley et al., 1985; Bresler, 2015; Hamilton et al., 2017) and natural language processing (Berger et al., 1996). In graphical models, it is known as the backward mapping: the problem of computing the model parameters from the marginal information (Wainwright & Jordan, 2007). The inverse problem, estimating the marginal parameters from the model parameters, is called the forward mapping. Maximum entropy inference is also a core concept in statistical physics (Jaynes, 1957), where it is known as Jaynes' principle, which links statistical mechanics with information theory. The Hammersley-Clifford theorem establishes that, in the classical case, any positive probability distribution satisfying the local Markov property can be represented as a Gibbs distribution (Lafferty et al., 2001).
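To make the forward/backward terminology concrete, here is a minimal classical sketch. It is not the quantum algorithm of the paper. It assumes a hypothetical three-spin Ising-type model with local fields and pairwise couplings. The forward mapping computes the marginals E[f] from the parameters theta by exact enumeration. The backward mapping recovers theta from target marginals mu by gradient descent on the convex dual log Z(theta) - theta . mu, whose gradient is E_theta[f] - mu. The model size, feature set, step size, iteration count, and function names are all illustrative assumptions.

```python
import itertools
import math

# Hypothetical toy model: n = 3 classical +/-1 spins with local fields and
# pairwise couplings (a classical stand-in, not the paper's quantum setting).
N = 3
PAIRS = [(0, 1), (0, 2), (1, 2)]
STATES = list(itertools.product((-1, 1), repeat=N))


def features(s):
    """Sufficient statistics f(s): the spins s_i, then the products s_i * s_j."""
    return [s[i] for i in range(N)] + [s[i] * s[j] for i, j in PAIRS]


FEATS = [features(s) for s in STATES]


def forward(theta):
    """Forward mapping: parameters theta -> marginals E_theta[f] under the Gibbs
    distribution p(s) proportional to exp(theta . f(s)), by exact enumeration."""
    scores = [sum(t * f for t, f in zip(theta, fs)) for fs in FEATS]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(x - top) for x in scores]
    z = sum(weights)
    return [sum(w * fs[k] for w, fs in zip(weights, FEATS)) / z
            for k in range(len(theta))]


def backward(target, step=0.2, iters=5000):
    """Backward mapping: gradient descent on the convex dual
    log Z(theta) - theta . target, whose gradient is E_theta[f] - target.
    The Hessian is the feature covariance; each feature is +/-1, so its
    largest eigenvalue is at most 6 and step = 0.2 stays below 2 / 6."""
    theta = [0.0] * len(target)
    for _ in range(iters):
        grad = [m - t for m, t in zip(forward(theta), target)]
        theta = [th - step * g for th, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    theta_true = [0.3, -0.2, 0.1, 0.4, -0.3, 0.2]
    mu = forward(theta_true)   # marginals of a known model
    theta_hat = backward(mu)   # parameters inferred from those marginals
    print([round(x, 6) for x in theta_hat])
```

Because the features s_i and s_i * s_j are linearly independent, the maximum-entropy parameters matching a given set of marginals are unique. The recovered theta should therefore agree with theta_true up to numerical precision. Exact enumeration costs 2^n, so this approach only works at toy sizes.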
High Efficiency Inference Accelerating Algorithm for NOMA-based Mobile Edge Computing
Yuan, Xin, Li, Ning, Zhang, Tuo, Li, Muqing, Chen, Yuwen, Ortega, Jose Fernan Martinez, Guo, Song
Splitting the inference model among the device, the edge server, and the cloud can greatly improve the performance of edge intelligence (EI). In addition, non-orthogonal multiple access (NOMA), a key enabling technology of B5G/6G, can achieve massive connectivity and high spectral efficiency. Motivated by these benefits, integrating NOMA with model splitting in mobile edge computing (MEC) to further reduce inference latency is attractive. However, NOMA-based communication during split inference has not been properly considered in previous works. In this paper, we therefore integrate NOMA into split inference in MEC and propose an effective communication and computing resource allocation algorithm to accelerate model inference at the edge. Specifically, when a mobile user has a large model inference task to be computed in the NOMA-based MEC system, the energy consumption of both the device and the edge server and the inference latency are jointly taken into account to find the optimal model split strategy, subchannel allocation strategy (uplink and downlink), and transmission power allocation strategy (uplink and downlink). Since minimum inference delay and minimum energy consumption cannot be achieved simultaneously, and the subchannel allocation and model split variables are discrete, the gradient descent (GD) algorithm is adopted to find the optimal tradeoff between them. Moreover, a loop iteration GD approach (Li-GD) is proposed to reduce the complexity of the GD algorithm caused by the discrete parameters. The properties of the proposed algorithm are also investigated, demonstrating the effectiveness of the proposed algorithms.
Artificial intelligence (AI) has been widely adopted and has greatly changed our lives, with applications such as the metaverse [1-2], autonomous driving [2-4], and image generation [5]. However, because AI models are usually large in order to achieve high accuracy, they require substantial computing resources. Therefore, it is inappropriate to deploy these AI models on mobile devices, such as mobile phones and vehicles, whose computing resources are quite limited.
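The excerpt does not specify the system model or the exact Li-GD procedure. The following is therefore only a minimal sketch of the general pattern the abstract describes. A discrete decision (here, the model split point) is handled by an outer loop, while gradient descent optimizes a continuous variable (here, a single uplink transmit power) to trade latency against energy. The cost model (a Shannon-type uplink rate and a weighted sum of latency and device energy), every constant, and every function name are hypothetical. Subchannel allocation, the downlink, NOMA interference, and the cloud tier are omitted.

```python
import math

# --- Hypothetical system parameters (illustrative only, not from the paper) ---
LAYER_GCYCLES = [2.0, 4.0, 4.0, 2.0]  # compute per layer (Gcycles)
UPLOAD_MBIT = [8.0, 6.0, 1.0, 0.5]    # data uploaded when split before layer s
F_DEVICE_GHZ, F_EDGE_GHZ = 1.0, 10.0  # device / edge CPU speeds
J_PER_GCYCLE = 0.5                    # device energy per Gcycle (J)
BANDWIDTH_MHZ, SNR_PER_WATT = 1.0, 100.0
P_MIN, P_MAX = 0.01, 1.0              # transmit power bounds (W)
WEIGHT = 0.5                          # latency weight; 1 - WEIGHT weights energy


def rate(p):
    """Uplink rate in Mbit/s from an assumed Shannon-type channel model."""
    return BANDWIDTH_MHZ * math.log2(1.0 + SNR_PER_WATT * p)


def d_rate(p):
    """Derivative of rate(p) with respect to p."""
    return BANDWIDTH_MHZ * SNR_PER_WATT / ((1.0 + SNR_PER_WATT * p) * math.log(2.0))


def objective(s, p):
    """Weighted latency + device energy when layers [0, s) run on the device."""
    local, remote = sum(LAYER_GCYCLES[:s]), sum(LAYER_GCYCLES[s:])
    tx_time = UPLOAD_MBIT[s] / rate(p)
    latency = local / F_DEVICE_GHZ + tx_time + remote / F_EDGE_GHZ
    energy = J_PER_GCYCLE * local + p * tx_time
    return WEIGHT * latency + (1.0 - WEIGHT) * energy


def d_objective_dp(s, p):
    """Analytic derivative of objective(s, p) with respect to p."""
    r = rate(p)
    tx_time = UPLOAD_MBIT[s] / r
    d_tx = -UPLOAD_MBIT[s] * d_rate(p) / r ** 2
    return WEIGHT * d_tx + (1.0 - WEIGHT) * (tx_time + p * d_tx)


def optimize_power(s, p0=0.5, step=1.0, iters=200):
    """Projected gradient descent on p with Armijo-style backtracking.
    Only steps that decrease the cost are accepted, so the cost never rises."""
    p, cost = p0, objective(s, p0)
    for _ in range(iters):
        g = d_objective_dp(s, p)
        t = step
        while True:
            cand = min(max(p - t * g, P_MIN), P_MAX)
            c = objective(s, cand)
            if c <= cost - 0.5 * (p - cand) ** 2 / t:  # sufficient decrease
                break
            t *= 0.5
            if t < 1e-12:
                return p, cost  # no acceptable step left: treat as converged
        if cand == p:
            break  # stationary, or pinned at a power bound
        p, cost = cand, c
    return p, cost


def enumerate_splits_with_gd():
    """Outer loop over the discrete split point, inner GD over the power."""
    best = None
    for s in range(len(LAYER_GCYCLES)):
        p, c = optimize_power(s)
        if best is None or c < best[2]:
            best = (s, p, c)
    return best


if __name__ == "__main__":
    split, power, cost = enumerate_splits_with_gd()
    print(f"split before layer {split}, power {power:.3f} W, cost {cost:.4f}")
```

Enumerating the split point is cheap here because a model has only a handful of layers. The backtracking rule keeps the inner descent monotone without hand-tuning a step size for each split.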
A Study of Condition Numbers for First-Order Optimization
Guille-Escuret, Charles, Goujaud, Baptiste, Girotti, Manuela, Mitliagkas, Ioannis
The study of first-order optimization algorithms (FOA) typically starts with assumptions on the objective function, most commonly smoothness and strong convexity. These metrics are used to tune the hyperparameters of FOA. We introduce a class of perturbations quantified via a new norm, called the *-norm. We show that adding a small perturbation to the objective function has an equivalently small impact on the behavior of any FOA, which suggests that it should have only a minor impact on the tuning of the algorithm. However, we show that smoothness and strong convexity can be heavily impacted by arbitrarily small perturbations, leading to excessively conservative tunings and convergence issues. In view of these observations, we propose a notion of continuity of the metrics, which is essential for a robust tuning strategy. Since smoothness and strong convexity are not continuous, we conduct a comprehensive study of existing alternative metrics, which we prove to be continuous. We describe their mutual relations and provide guaranteed convergence rates for the Gradient Descent algorithm tuned accordingly. Finally, we discuss how our work impacts the theoretical understanding of FOA and their performance.
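A one-dimensional toy example illustrates the abstract's central observation. The sketch assumes f(x) = x^2 / 2, so L = mu = 1, and adds a hypothetical perturbation eps^(3/2) * cos(x / eps). The excerpt does not define the *-norm, so this example uses a stand-in measure instead: how much the perturbation changes the gradient. That change is at most sqrt(eps). Yet the perturbation raises the smoothness constant to about 1 + 1/sqrt(eps) and makes the function non-convex. All function names and constants are illustrative.

```python
import math

EPS = 0.01  # hypothetical perturbation scale


def grad_f(x):
    """Gradient of the unperturbed objective f(x) = x**2 / 2 (L = mu = 1)."""
    return x


def grad_f_eps(x, eps=EPS):
    """Gradient of f(x) + eps**1.5 * cos(x / eps); it differs from grad_f
    by at most sqrt(eps) in absolute value."""
    return x - math.sqrt(eps) * math.sin(x / eps)


def hess_f_eps(x, eps=EPS):
    """Second derivative of the perturbed objective: 1 - cos(x / eps) / sqrt(eps)."""
    return 1.0 - math.cos(x / eps) / math.sqrt(eps)


def gradient_descent(grad, x0, step, iters):
    """Plain fixed-step gradient descent in one dimension."""
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x


def curvature_range(hess, lo=-1.0, hi=1.0, n=200001):
    """Grid estimates of the smoothness constant L (max curvature) and the
    strong-convexity constant mu (min curvature) on [lo, hi]."""
    values = [hess(lo + (hi - lo) * i / (n - 1)) for i in range(n)]
    return max(values), min(values)


if __name__ == "__main__":
    L_est, mu_est = curvature_range(hess_f_eps)
    print(f"perturbed: L ~ {L_est:.3f}, mu ~ {mu_est:.3f}")
    # Step 1 is the 1/L tuning for the unperturbed f.
    print("step 1 on perturbed f:", gradient_descent(grad_f_eps, 5.0, 1.0, 50))
    # Step 1/L_est is the conservative tuning dictated by the perturbed L.
    print("step 1/L_est on f:", gradient_descent(grad_f, 5.0, 1.0 / L_est, 10))
```

With eps = 0.01, take the step size 1 tuned for f and run GD on the perturbed function: from the first iterate onward it stays within sqrt(eps) = 0.1 of 0, the minimizer of f. In contrast, the step 1/L of roughly 1/11 dictated by the perturbed smoothness constant shrinks the iterate on f itself by only a factor of 10/11 per step.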